I am an associate professor in the Departments of Industrial Engineering & Management Sciences and Computer Science at Northwestern University. I am also with the Centers for Deep Learning and Optimization & Statistical Learning.
The long-term goal of my research is to develop a new generation of data-driven decision-making methods, theory, and systems, which tailor artificial intelligence towards addressing societal challenges. To this end, my research aims at:

- making autonomous learning agents more efficient, both computationally and statistically, in a principled manner to enable their emerging applications;
- designing and optimizing societal-scale multi-agent systems, especially those involving cooperation and/or competition among humans and/or robots.
Selected Recent Papers [Overview] [Conference] [Journal] [Citation]
| Reason for Future, Act for Now: A Principled Architecture for Autonomous LLM Agents. International Conference on Machine Learning (ICML), 2024 [Arxiv] [Demo] [GitHub] |
| Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer. Advances in Neural Information Processing Systems (NeurIPS), 2024 [Arxiv] |
| Maximize to Explore: A Single Objective Fusing Estimation, Planning, and Exploration. Advances in Neural Information Processing Systems (NeurIPS), 2023 (spotlight) [Arxiv] |
| Embed to Control Partially Observed Systems: Representation Learning with Provable Sample Efficiency. International Conference on Learning Representations (ICLR), 2023 [Arxiv] |
| Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency. International Conference on Machine Learning (ICML), 2022 [Arxiv] |
| A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic (alphabetical). SIAM Journal on Optimization (SIOPT), 2022 [Arxiv] |
| Is Pessimism Provably Efficient for Offline RL? International Conference on Machine Learning (ICML), 2021; Mathematics of Operations Research (MOR), 2024 [Arxiv] |
| Provably Efficient Causal Reinforcement Learning with Confounded Observational Data. Advances in Neural Information Processing Systems (NeurIPS), 2021 [Arxiv] |
| Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory. Advances in Neural Information Processing Systems (NeurIPS), 2020 (oral) [Arxiv] |
| Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret. Advances in Neural Information Processing Systems (NeurIPS), 2020 (spotlight) [Arxiv] |
| Pontryagin Differentiable Programming: An End-to-End Learning and Control Framework. Advances in Neural Information Processing Systems (NeurIPS), 2020 [Arxiv] [Demo] |
| Provably Efficient Exploration in Policy Optimization. International Conference on Machine Learning (ICML), 2020 [Arxiv] |
| Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium. Annual Conference on Learning Theory (COLT), 2020; Mathematics of Operations Research (MOR), 2023 [Arxiv] |
| Provably Efficient Reinforcement Learning with Linear Function Approximation. Annual Conference on Learning Theory (COLT), 2020; Mathematics of Operations Research (MOR), 2023 [Arxiv] |
| Neural Policy Gradient Methods: Global Optimality and Rates of Convergence. International Conference on Learning Representations (ICLR), 2020 [Arxiv] |
| Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy. Advances in Neural Information Processing Systems (NeurIPS), 2019 [Arxiv] |
| Neural Temporal-Difference and Q-Learning Provably Converge to Global Optima. Advances in Neural Information Processing Systems (NeurIPS), 2019; Mathematics of Operations Research (MOR), 2024 [Arxiv] |
| A Theoretical Analysis of Deep Q-Learning (alphabetical). Submitted, 2020 [Arxiv] |
